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^ ; Abstract 

> : 

Importance sampling and Metropolis-Hastings sampling (of which Gibbs sampling is 
a special case) are two methods commonly used to sample multi-variate probabil- 
ity distributions (that is, Bayesian networks). Heretofore, the sampling of Bayesian 
t— i ■ networks has been done on a conventional "classical computer". In this paper, we 

propose methods for doing importance sampling and Metropolis-Hastings sampling 
O ' °f a classical Bayesian network on a quantum computer. 

1 Introduction 

Monte Carlo methods are frequently used to sample probability distributions. For 
a single random variable, it is common to draw samples using the inverse transform 
methodpQ or the ARM (acceptance-rejection method) [2]. For an n-tuple of dependent 
random variables (i.e., a Bayesian network), it is common to use importance sampling 
(see Appendix [A}, Gibbs sampling (see Appendix IB. lj) and Metropolis-Hastings sam- 
pling (see Appendix IB. 21) . Two special cases of importance sampling are rejection 
sampling and likelihood weighted sampling (a.k.a. likelihood weighting). 

In previous papers written by me, I define some nets that describe quantum 
phenomena. I call them "quantum Bayesian nets"(QB nets). They are a counter- 



ed 



1 



part to the conventional "classical Bayesian nets" (CB nets)[3j that describe classical 
phenomena. 

Heretofore, the sampling of CB nets has been done on a conventional "classical 
computer" . In this paper, we advocate sampling a CB net with a quantum computer. 

In Ref. [I], we proposed a method for "embedding" a CB net within a QB net. 
By applying this embedding technique, we were able to obtain in Ref. [5] a method 
of doing both rejection sampling and likelihood weighted sampling of a CB net on a 
quantum computer. In Ref. [5], we illustrated our technique by applying it to a special 
CB net used in medical diagnosis, the QMR (Quick Medical Reference) CB net. 

In this paper, we generalize the results of Ref. [5] to include all kinds of im- 
portance sampling, (not just rejection and likelihood weighted sampling). We also 
show how to do Gibbs sampling and Metropolis-Hastings sampling of a CB net with 
a quantum computer. 

Other workers [61 [71 El E] have considered sampling of a probability distribution 
using a quantum computer. Their methods are very different from ours. Contrary 
to them, we utilize a general technique, first proposed in Ref. [I], for embedding CB 
nets within QB nets. We leave to future work a deeper, more detailed comparison 
between their methods and ours. 

2 Notation and Preliminaries 

In this section, we will define some notation that is used throughout this paper. For 
additional information about our notation, we recommend that the reader consult 
Ref. [TD] • Ref. [10] is a review article, written by the author of this paper, which uses 
the same notation as this paper. 

We will often use the symbol Nb for the number (> 1) of qubits and N$ = 2 B 
for the number of states with Nb qubits. The quantum computing literature often 
uses n for Nb and N for Ns, but we will avoid this notation. We prefer to use n for 
the number operator, defined below. 

Let Bool = {0, 1}. As usual, let Z, IR, C represent the set of integers (negative 
and non-negative), real numbers, and complex numbers, respectively. For integers a, 
b such that a < b, let Z a ^ = {a, a + 1, ... b — 1, b}. For any positive integer k and any 
set S, let S k denote the Cartesian product of k copies of S; i.e., the set of all fc-tuples 
of elements of S. For any set S, let \S\ be the number of elements in 5*. 

We will use Q(S) to represent the "truth function"; Q(S) equals 1 if statement 
5* is true and if S is false. For example, the Kronecker delta function is defined by 
8% = 5(x,y)_= e(x = y). 

Let = 1 and 1 = 0. If a = a^ B -\ ■ ■ ■ a^aiao, where a M G Bool, then dec(a) = 
X^o" 1 2 Ma At = a - Conversely, a = bin(a). 

We define the single-qubit states |0) and |1) by 
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If a e Bool B , we define the iV^-qubit state \a) as the following tensor product 



\ a N B -l) 



■ ■ \ai) 



(2) 



For example, 



|01) 



" 1 " 




" " 




1 







1 












(3) 



When we write a matrix, and leave some of its entries blank, those blank 
entries should be interpreted as zeros. 

I k and 0^ will represent the k x k unit and zero matrices, respectively. For any 
matrix A E C pxq , A* will stand for its complex conjugate, A T for its transpose, and 
A} for its Hermitian conjugate. 

For any matrix A and positive integer k, let 



A® k = A 



A® A 



v 

k copies of A 



A® k = A 



A® A . 



k copies of A 

Suppose (5 G Z 0t N B ^i and M is any 2x2 matrix. We define M(f3) by 



M{(5) 



I 2 ® M ® I 2 ® 



(4) 
(5) 

(6) 



where the matrix M on the right hand side is located at qubit position (3 in the tensor 
product of N B 2x2 matrices. The numbers that label qubit positions in the tensor 
product increase from right to left (•<—), and the rightmost qubit is taken to be at 
position 0. 

The Pauli matrices are 



ox 



1 

1 



cry 



-i 

1 



1 
-1 



(7) 



Let a = (o- x , &y, &z)- For any a G M 3 , let ag — a • a. 

The one-qubit Hadamard matrix H is defined as: 



V2 



1 1 
1 -1 



(8) 



The iV s -qubit Hadamard matrix is defined as H® Nb . 
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The number operator n for a single qubit is defined by 



n 




1 



1 - a z 



Note that 



n\0) = 0|0) = , n\l) = 1 |1) . 
We will often use n as shorthand for 



n = 1 — n = 



1 




1 + °Z 



Define P and Pi by 



Po = n 



1 





1 



= |0> (0| , P 1 = n = 
P and Pi are orthogonal projection operators and they add to 

P a Pb = 5(a, b)Pb for a,b e -Boo/ , 



P + Pi = /: 



2 • 



For a e BooI Nb , let 



P- = P 



a ajv R -i 



• • • <g> P a2 <g> P ai <g> P ao 



For example, with 2 qubits we have 

P 00 = Po®Po = g^(1, 0,0,0), 

Poi = P ® Pi = ^(0,1,0,0) , 

P 10 = Pi <g> P = diag(0, 0, 1, 0) , 



Note that 



Pn = Pi (8) Pi = diag(0, 0, 0, 1) 



P s Ps = 6{a, b)P t for a,b G Poo/^ , 



P d = I 2 ®I 2 ®-..®I 2 = L 



2«s 



deBooi N B 



Next we explain our circuit diagram notation. We label single qubits (or qubit 
positions) by a Greek letter or by an integer. When we use integers, the topmost qubit 
wire is 0, the next one down is 1, then 2, etc. Note that in our circuit diagrams, time 
flows from the right to the left of the diagram. Careful: Many workers in Quantum 
Computing draw their diagrams so that time flows from left to right. We eschew 
their convention because it forces one to reverse the order of the operators every time 
one wishes to convert between a circuit diagram and its algebraic equivalent in Dirac 
notation. 

Suppose U G U(2). If r and k are two different qubit positions, gate U(r) n!yK ^ 
(or [/(r)™( K )) is called a controlled U with target r and control K. When U = ax, 
this reduces to a CNOT (controlled NOT). If t,K\ and kq are 3 different qubit 
positions, <7x('r) n<Kl < |ra(K0 ' ) is called a Toffoli gate with target r and controls Ki, kq. 
Suppose Nk > 2 is an integer and b G BooI Nk . Suppose r, Kn k -i, ^n k -2, ■ ■ ■ ,K\,Ko 
are distinct qubits and k = (k Nk _i, k Nk _ 2 , . . . , k 1 , k ). Gate U(r) p b^ is called a 
multiply controlled U with target r and Nk controls K. When U = ax, this 
reduces to an MCNOT (multiply controlled NOT). 

For any set Q and any function / : Q — > C, we will use f(x)/(^2 xen num), 
where "num" stands for numerator, to mean /(aO/(XLefi f( x ))- This notation is 
convenient when f(x) is a long expression that we do not wish to write twice. 

Consider an n-tuple / = (fi, f 2 , . . . , /„), and a set A C Z l n . By (j)a we will 
mean (fi)i£A j that is, the | A | -tuple that one creates from /, by keeping only the 
components listed in A. 

Symbols which represent random variables will be underlined. The set of 
values (or states) that a random variable x can assume will be denoted by val(x) ( 
or Sx). Samples of x will be denoted by x^ for k e Z 1>Nsam . 

Next, consider a CB net with nodes x^a^ • • • i-H±N nis - 

We will use pa(i) (ch(i), respectively) to denote the set of all j G Zi,jv nda such 
that Xj is a parent (child, respectively) of x^ Suppose 7 = pa, ch. Let 7(2^) = {x_j : 
j G 7 (z)}. Let 7 (5) = U i6S 7(i). 

The Markov blanket of x { is defined by 

MB(i) = pa{i) U ch{i) U pa(ch(i)) . (22) 
Let {i} c = Z\^ nis — {i}- One an prove that 

P(Xi\(x) {i y) = P(Xi\(x) M B(i)) ■ (23) 

We won't prove Eq. ff23l here, but next we will give an example to make it plausible. 
For the CB net shown in FigJH one has 
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Figure 1: The Markov blanket of node x is the set of all nodes inside the large circle, 
excluding x. 



P(x\x', x", a, a', a", b) 



P{x\x')P{x'\x")P{x")P{a\a',x)P{a'\a")P{a")P{b\a) 



J2 X num 



P(x\x')P(a\a', x) 
P(x\x' a, a') . 



(24) 



3 Multiplexors 

In this section, we discuss some multi-qubit transformations called multiplexors. 

Suppose that U is an N x N unitary matrix, where N is an even number. The 
Cosine-Sine Decomposition (CSD) Theorem|ll] states^ that one can always 
express U in the form 



U 



L 
Lx 



D 



Ro o 

i?i 



(25a) 



where the left and right matrices Lq, Li, Rq, R\ are y x T unitary matrices, and 



D 



Doo = D n = diag(Cx, C 2 ,...,C 



(25b) 
(25c) 



Dqi = diag(Si, S2, ■ ■ ■ , Sn) , D 



'10 



-D 



01 



(25d) 



1 Actually, this is only a special case of the CSD Theorem — the case which is most relevant to 
quantum computing. The general version of the CSD Theorem does not restrict the dimension of U 
to be even, or even restrict the blocks into which U is partitioned to be of equal size. 
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For all % G Z x n , Ci = cos 0$ and = sin 8i for some angle ^ . Eqs. (1251) can be 
expressed more succinctly as 



(26) 



U = (L © LxY^^iRo © iq) , 

where 9 = diag(9i, 2 , ■ ■ ■ ,0n). 

We will henceforth refer to Ref.[T2"] as Tuc99. Tuc99 was the first paper to 
use the CSD to compile unitary matrices. By "compiling a unitary matrix", we 
mean decomposing it into a SEO (Sequence of Elementary Operators), elementary 
operators such as single-qubit rotations and CNOTs. 

Note that for some 6? G R and iV = 2 Nb , matrix D of Eq.(J25J can be expressed 

as 



D 



e^ aY <g> 



beBool N B- 



n 



(27a) 
(27b) 
(27c) 



beBooi N B- 



b ) with 

/ AT B -2,...,1,0 



To prove that Eqs. f)27al) . fl27b|) . and fl27c[) are equivalent, just apply 

6 G BooI Nb ~ 1 to the right hand side of each line, and use the fact that 

(Note that we can "pull the b sum" out of the argument of the exponential only if we 
also pull out the ®Pg.) 

In Tuc99, I refer to matrices of the form of the D matrix of Eq. (125ft simply 
as "D-matrices" . In my papers that followed Tuc99, I've begun calling such matrices 
"multiplexors" H When I want to be more precise, I call the D matrix of Eq. (1251) . an 
i? ?/ (2)-multiplexor with target qubit N B — 1 and control qubits N B — 2, . . . , 2, 1, 0. The 
Ry{2) term refers to the fact that the set of operations acting on the target qubit are 
2x2 qubit rotations R y (4>) = for some <p G R. More generally, one can speak 
of [/(iV)-multiplexors. Henceforth in this paper, I'll continue using this multiplexor 
nomenclature, even though it's not used in Tuc99. 

Tuc99 gives identities for decomposing an arbitrary R y {2) -multiplexor with 
Nb — 1 controls into a SEO with 2 Nb ~ 1 CNOTs. Figj2] shows an example of the 
SEO decomposition found in Tuc99 for an i? y (2)-multiplexor. In Figj2j 0,1,2,3 are 
the control qubits, and 4 is the target qubit. The empty square vertices represent 



2 "multiplexor" means "multi-fold" in Latin. A special type of electronic device is also called a 
multiplexor or multiplexer. 
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a ■ 
(>=• 
o 

□ i — d^-i — i — tski — akj — i-^j — i — l^-i — i^kj — is^i — Likj — — Likj — i — i^kj — Likj — isl* 
I i7V1 ivvl iTVl r7VI rTVl rTSl iTVl r/vl r7sl I7V1 iTVl rTVl r/vl r/vl rTVl rTv^ 

Figure 2: A possible decomposition of an /^^-multiplexor with 4 controls. 



R y {2) gates. The symbol to the left of the equal sign, the one with the "half- moon" 
vertices, was invented by the authors of Ref . PS] to represent a multiplexor. 

U(N) multiplexors for any N > 2 satisfy certain simple properties that we 
shall discuss next. 

Let "con" stand for control and "tar" for target. 

Suppose b e BooI Nb ' cou and {t/g} v j is a family of 2 NB > tar x 2 Nb ^ t unitary 
matrices. Define 

= Y, u i® p i- ( 28 ) 

b 

Suppose b E Bool NB < con \ V E Bool NB ' con2 , and {t%,} v ^, is a family of 2 JV ».««' x 
2 N B,tar unitary matrices. Define 



(29) 



Claim llfbE BooI Nb > co " and {£%} v g, {Vg} v g are two families of 2 Nb ^ x 2 Nb ^ 
unitary matrices, then 

HUb] EM = EftUfffl . (30) 

The last equation can be represented in circuit notation. For example, when N B>con = 2 
and N B tar = 2, one writes 



o— — c— 



<- _ -j- 



(31) 



S 



proof: Obvious. 
QED 

Claim 2 If be Bool NB ' conl , b' e Bool NB ' con2 , and {U^,}^,, is a family of2 Ns ^ x 
2 N B,tar unitary matrices, then 



(32) 



The last equation can be represented in circuit notation. For example, when Nb, C oui 
1, N B „ on2 = 2, and N BMr = 2, one writes 




(33) 



proof: Obvious. 
QED 

We end this section by proving a "chain rule" for R y {2) multiplexors (similar to 
the Chapman-Kolgomorov chain rule for conditional probabilities). A result similar 
to the next claim is given in Ref.|13j. 

Below, when an index is replaced by a dot, we mean that the index is summed 
over all its possible values. For example, q..k = Z^ij Qijk- 

Claim 3 Suppose b = (6jy fl _i, . . . , b\, bo) G BooI Nb , q^ > 0, J2b1b = 1 Assume 
Ng = 3 for definiteness. One has 



b^Bool 3 



u\oy 



if 



(34a) 



U 



I 2 ® e l ^o %^®^ol [If 2 e WaY ] , (34b) 



and the angles dbib , @b and 9 are defined by 

1 



(C&l&o) S^bo) 

(Cfeo; ^b ) 



y/1-biba 
1 



lbo 



(35a) 
(35b) 
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(C,S) 



(35c) 



d 5 7 = sin(6' 7 ) /or any symbol^ (including no symbol). Eq.ffifl) 



where C 1 = cos(6 l 7 ) an 
can be represented as a circuit diagram, as follows: 



v% 

b&Bool 3 



10} 
10} 
10} 



proof: 



(6| P h > = 6% (b\ for 6, b' G Bool. Thus 



(& 2 A,& |t/|0} c 



{b 2 \ |0) (h\ e irTY S \0) <6 | e i<Ty " |0) 



Let 



P(b 2 \b ll b ) = C 2 b b i iXl 



b ' 



Then 



95 



P(h\bo) = Cfsj 1 , 



p(bo) = C 2bo S 2bo . 



|(&2,&1,&O|^|O} 03 | 2 

P(b 2 \b 1 ,b )P(b 1 \b )P(b ) 

P(b). 



Eqs.f l39|) are satisfied if 



\ / Qb 2 b 1 b 
y/Q-b^o 



rib 2 qb 2 
^b 1 b J b 1 b ' 



(~ib\ qbi 
^bo^bo > 



The above claim can be easily generalized to arbitrary Nb > 0. 



(36) 



(37) 
(38) 

(39a) 
(39b) 
(39c) 



(40) 
(41) 
(42) 



(43a) 

(43b) 
(43c) 



QED 



with entries A /gg. For example, for N B = 3, it equals [y/qwo, y/Qooi, y/Qow, ■ ■ y/q~in] J 



Note that v 7 ^ 



b ) , when expressed in matrix notation, is the column vector 
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4 Q-Embeddings 



In this section, we will review and extend the section of Ref . [3] entitled "Q-Embeddings' : 
A probability matrix P(y\x) is a rectangular (not necessarily square) matrix 
with row index y G S y and column index x G Sx such that P(y\x) > for all x, y, 
and Y2 y P{y\ x ) = 1 f° r an x - A probability matrix is assigned to each node of a CB 
net. 

A unitary matrix A(y, x\x, y) (with rows labelled by y, x and columns by x, y) 
is a q-embedding of probability matrix P(y\x) if 

^2\A(y,x\x,y = 0)| 2 = P(y\x) (44) 

for all possible values of y and x. (the "q" in "q-embedding" stands for "quantum"). 
When considering a q-embedding A(y,x\x, y) of a probability matrix, we will refer 
to y as the focus index, y as the focus-image index or source index, x as the 
parent index, and x as the parent-image index or sink index. We will also refer 
to x and y collectively as ancilla indices. 
Given a QB net N Q , let 



P[(x) 



(45) 



On the right hand side of Eq. (|45p . A(x) is the amplitude of story x, Tq is the set of 
indices of all the nodes of , and L is the set of indices of all leaf (a.k.a. external) 
nodes of J\f Q . We say J\f Q is a q-embedding of CB net J\f c if P[(x)l] defined by 
Eq.fll5]) satisfies 

*)r c ] = E P iWJ. ( 46 ) 

Li 

where L\ C L, and Tc is the set of indices of all nodes of M c . Thus, the probability 
distribution associated with all nodes of M c can be obtained from the probability 
distribution associated with the external nodes of . Ref. [1] gives two examples of 
q-embeddings of CB nets: the two-body scattering net and the Asia net. Ref.[5j gives 
the example of the Quick Medical Reference net. More examples will be given later 
in this paper. 



4.1 Q-Embeddings of Probability Matrices 

In Ref. [3], we showed that any probability matrix has a q-embedding. Our proof 
was constructive and relied on the Gram-Schmidt method. In this section, we will 
give a new proof, again constructive, that relies on multiplexors. The q-embeddings 
constructed in this section, compared with those of Ref. [3], have the advantage that 
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they are already compiled, whereas those obtained in Ref.[l] in general require a 
computer program, "a compiler" , to compile them numerically. 
Eq. (jHD is satisfied by 

A(y,x\y = 0,x) = ^P{^\x) b% . (47) 

When speaking of a parent index x and a focus index y, we will denote number of 
parent bits by NB, par = log 2 (iV £ ) , and number of focus bits by Nbj oc = log 2 (N y ), 
Also, N s>P ar = 2 Nb -^ = N £ , and N S j oc = 2 NB -f°° = Ny. Eq.@ZJ) can be expressed in 
matrix form as follows: 





(y = 0,x) -> 




D°'° 


[A(y,x\y = 0,x)]= | 


D i,o 




D Ny_-lfl 



where, for all y G val(y), D y, ° G m N - xN - are diagonal matrices with entries 

(l> :,A, )r., = JPW)5i . (49) 

By adding more columns to the matrix of Eq. (j4"51) , one can extended it to the following 
square matrix: 





(y,x) -> 






D ' 1 ■ 




[A(y,x\y,x)\ = | 


D i,o 


D 1 ' 1 ■ 






D Ny-l,0 


D N *r x > 1 ■ 


. D Ny-l,Ny-l 



E (51) 

feesooi log 2 % 

For all yi,2/2 G val(y), D yi,V2 G R^*-™^ are diagonal matrices. These diagonal matri- 
ces are chosen so that £/g G M. N ^- xN ^- are unitary matrices such that the first column 

of £/g is given by (C^) y ,o = y = dec(b)). The other columns of the t/j's can 

be chosen at will provided that they make the U^s unitary. According to Claim EJ 
we can choose each £7g to be a chain of multiplexors, in which case [A(y, x\y, x)] is a 
multiplexor of a chain of multiplexors. For example, if log 2 N y = 3 and log 2 = 2, 
then 
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and 



[My>2\y> x )] = 




As mentioned in Section O on multiplexors, Ref. [12] shows how to decompose a mul- 
tiplexor into a SEO. Thus, this particular q-embedding of P(y\x) comes already com- 
piled (decomposed into a SEO). 



4.2 Q-Embeddings of CB nets 

Ref.Jl] describes a method by which, given any CB net M , one can construct a QB 
net which is a q-embedding of M . In this section, we will review this method. 
The method will be used in later sections to construct QB nets for sampling. 

In the previous section, we showed how to construct a q-embedding for any 
probability matrix. Now remember that each node of M c has a probability matrix 
assigned to it. The main step in constructing a q-embedding of Af c is to replace each 
node matrix of M c with a q-embedding of it. 

Before describing our construction method, we need some definitions. We say 
a node m is a marginalizer node if it has a single input arrow and a single output 
arrow. Furthermore, the parent node of m, call it x, has states x — (xi, 22, • • • , x n ), 
where Xi G Sx. for each i G Z l n . Furthermore, for some particular integer i Q G Z l n , 
the set of possible states of m is S m = , and the node matrix of m is P(m = 
m\x = x) = 5(m, x, ). 

Let M c be a CB net for which we want to obtain a q-embedding. Our con- 
struction has two steps, given by Figj3l 

Consider a QB net J\f® which is a q-embedding of a CB net M c . In the 
following sections, we will label some nodes of by an underlined group of symbols, 
such as ax_, followed by an index enclosed in angular brackets, as in ax (4). These 
indices enclosed in angular brackets will be called worldline indices. In N®, a 
sequence of random variables such as ax{l), ax{2), . . . ax{n), where node ax(l) is 
set to zero and qx_(n) is an external node of J\f®, will be called a worldline of ax. 
When using worldline indices, variables like (x, x, y, y) that were used in describing a 
q-embedding of a probability matrix are replaced by: 



(x,x) = (x(i + l),x(i)) , . 

(!/,») = (V<J + 1>,V<J» ' 1 J 



for some integers i and j. 
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(Step 1) Add marginalizer nodes. 

More specifically, replace Af c by a modified CB net M c ' mo d obtained 
as follows. For each node x of J\f c , add a marginalizer node between 
x and every child of x. If x has no children, add a child (with a delta 
function as probability matrix) to it. 

(Step 2) Replace node probability matrices by their q- 
embeddings. Add ancilla nodes. 

More specifically, replace M c mo d by a QB net J\f® obtained as follows. 
For each node of N c mod, except for the marginalizer nodes that were 
added in the previous step, replace its node matrix by a new node 
matrix which is a q-embedding of the original node matrix. Add a new 
node for each ancilla index of each new node matrix. These new nodes 
will be called ancilla nodes (of either the source or sink type) because 
they correspond to ancilla indices. 

Figure 3: Algorithm for constructing a q-embedding of a CB net 



Every QB net N Q can be converted into an equivalent quantum circuit C®. 
To do so, each worldline of J\f® is turned into the time history of one or more qubits. 
(The number of qubits in a worldline is log2 of the number of possible states of node 
(1) of the worldline.) 

In Ref.[l] we consider two CB nets called Two-Body Scattering and Asia. For 
each of these CB nets Af c , we perform the steps of FigJH] to construct a QB net 
Af Q that is a q-embedding of Af c . Ref.[l] gives M c and J\f®, but not an equivalent 
quantum circuit C®. As examples and for the sake of completeness, Appendix IC1 gives 
quantum circuits for Two-Body Scattering and Asia. 



5 Importance Sampling 

In this section, we will propose a method for doing importance sampling of a CB net 
on a quantum computer. The traditional method for doing importance sampling of a 
CB net on a classical computer is reviewed in Appendix |A1 

Consider a CB net whose nodes are labeled in topological order by (x^, x 2 , . . . x^ nds ) 
./'. Assume that E (evidence set) and H (hypotheses set) are disjoint subsets of Zi^ nda , 
with Z^N nda —EUH not necessarily empty. Let X c = Z\^ nda —X for any X C Zi^ nda . 
Assume that we are given the prior evidence (x)e, and the number of samples N sam 
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that we intend to collect. 

Suppose x' is an arbitrary point in val(x). (We'll use the unprimed x, as in 
(x)e, to denote the evidence.) The probability matrices associated with each node 
of our CB net will be denoted by P(x' i \(x') pa u^) for each i G Zi,N nd3 - 111 addition, 
we will assume we are given sampling probability matrices, associated with each 
node of our CB net, denoted by Q(x' i \(x') pa ^)) for each i G Z\^ nds . In all cases, these 
sampling matrices are constrained to satisfy 

Q(x'i\(x%a(i)) = P(x' t \(x') pa{l) ) VieF. (55) 

Two important special cases of importance sampling are rejection sampling and like- 
lihood weighted sampling. For rejection sampling (RS), 

Q(Xi\(x')pa(z)) = P(2-|(x%a(i)) G Z 1>NndB . (56) 

Hence, Q(x') = P(x') for rejection sampling. For likelihood weighted sampling 
(LWS) (a.k.a. likelihood weighting), 

0(x'\(x') ,V> - I P (^K X %«W) Vi G E c 

Hence, Q(x') = Y\i£E c P( x i\( x ')pa(i)) f° r likelihood weighted sampling. 

Under these assumptions, the algorithm for importance sampling of a CB net 
on a quantum computer is given by FigJH (expressed in pseudo-code, pidgin C lan- 
guage). The only difference between the classical algorithm of Fig 19] and the quantum 
one of FigJD is the underlined command. In the quantum case, we use a quantum 
computer instead of a classical one to generate x^ ~ (5(xi|(x (fe) ) pa (i)). To do this, 
we can find a q-embedding of Q(xi\(x^) pa ^). Starting from an a priori known pure 
state of the parents | (x) pa (i) = {x^) pa (i)), one applies the q-embedding to it, and then 
finally one measures Xj. 

For example, suppose for some i, Xi G Bool and (x) pa ^ = (xi,x ) G Bool 2 . 
Denote x« by y. Then a q-embedding A of Q(xi\(x)pa(i)) satisfies 



A(y, x h x \y = 0, x u x ) = VQ(y|xi,x ) S%8% ■ (58) 

Suppose that the a priori known pure state of the parents is \(x) pa ^ = (x[, x' )). If 
we indicate non-zero entries by a plus sign, 
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For all (x) H {W[(x) h \ = 0; } 
W tot = 0; 

For samples k — 1,2, ... , N sam { 
L = l; 

For nodes % = 1, 2, . . . , N nds { 

Use quantum computer to generate x^ from Q(xi|(x^)pa(i)); 
/ /Here, for LWS, Xj^* 1 == Xi when i E E. 
//pa(i) C so (a;( fc ^)p a (j) known at this point, 

if i E E{ 

if Xj i k ) == Xi { 

Q(*i|(*< fc) )p«(i))' 

//Here g = 1 for RS and § = P for LWS. 
}else{/ /LWS never enters here 
go to next k; 

} 

} 

}//i loop (nodes) 

H/[(xW)^] + = L; 

Wtot + = L; 
}//k loop (samples) 
For all (x) H {P((x) H \(x) E ) = } 



Figure 4: Algorithm for importance sampling of CB net on quantum computer. 
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110 






+ 






111 








+ 





\y = 0,x[,x' )(59) 



-> ^ e*Vv®^|j/ = 0,^,4) (60) 
beBool 2 

= e ld ^'o aYi2) \y = 0,x[,x' o ) , (61) 
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for some #g G R. Here the right pointing arrow means that the expression at the 
origin of the arrow can be extended to the expression at the target of the arrow. Note 
that, according to Eq. (|6ip . it is not necessary to perform all elementary operations 
that constitute a decomposition of A. We need only perform one single-qubit rotation 
picked out by the a priori known state of the parents. The final step is to measure y 
on the state of Eq. (l6TI) . without measuring x x and x . 

If Xj or one of its parent nodes has more than two possible states, then (see 
Section 14.11) we can still represent the q-embedding A as a multiplexor of a chain of 
multiplexors. This will give for A |x"j = 0, {x( k ') pa (i)) a chain of multiplexors acting 
on \£i = 0, (ir^)po(i))- The final step is to measure x { , without measuring (x) pa (i)- 

Note that since nodes x_j for j G E are fixed, we may treat them as if each 
had only one possible state. This will reduce the size of the probability matrix 
P(xi\(x) pa (i)), and of its q-embedding A. 

6 Markov Chain Monte Carlo 
6.1 Gibbs Sampling 

In this section, we will propose a method for doing Gibbs sampling of a CB net on 
a quantum computer. The traditional method for doing Gibbs sampling of a CB net 
on a classical computer is reviewed in Appendix IB. 11 

Consider a Markov chain a; —>■ x 1 — > x 2 . . . —>■ x T . Let x* = (x_i,x|, ■ ■ ■ , xjy ) 
for each time t represent a separate copy of a CB net with nodes x^, x|, . . . , xjy- , and 
probability matrices P(x t i \(x t ) pa ^). (The nodes of the CB net x* are not necessarily in 
topological order.) Assume that E (evidence set) and H (hypotheses set) are disjoint 
subsets of Zi^ nds , with Zi t N nds — E\JH not necessarily empty. Let X c = Z\^ nis —X for 
any X C Z\^ nda - Assume that we are given the prior evidence (x)e- All probabilities 
in this section about the Gibbs algorithm will be conditioned implicitly on (x*) e = 
(x)e for all t. The Gibbs algorithm is designed to respect this constraint, by never 
changing the value of (x?)e after it is initially set. 

Suppose that we wish to sweep through all nodes of graph x\ in a fixed 
deterministic order, repeating this all-nodes-sweep N gra times, t will change by one 
every time one node is visited. Thus, the last time T of the Markov chain will 
be N ni isNg ra . Suppose we wish to sweep through (3 copies of the graph x* before 
performing each measurement. Assume that we are given the burn time t\j UTn (0 << 

tburn ^ *^ Tj . 

Under these assumptions, the algorithm for Gibbs sampling of a CB net on a 
quantum computer is given by Figj5] (expressed in pseudo-code, pidgin C language). 
When (3 = 1, the only difference between the classical algorithm of Fig ITT] and the 
quantum one of FigJS] is that the two underlined commands in the quantum algorithm 
replace the i loop (over nodes) in the classical one. In the quantum case, we use a 
quantum computer to generate x t+/3Nnds ~ P(x t+/3Nnds \x t ). To do this, we find a CB 
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For all (x) H {W[(x) h } = 0; } 
W tot = 0; 

Initialize x° to some value x°, subject to (x°)e = (%)e) 
t = 0; 

For graphs g = (3, 2p, 3/?, . . . , N gra { 

Use quantum computer to generate x t+l3Nnda ~ P(x t+l3Nnds \x t ) 
t+ = (3N nds ; 

if t > h urn {/ / << Uurn « N n d s N gra 

W[(x t ) H ] + + ] 
W tot + +; 

} 

}//g loop (graphs) 

For all (x) H {P((x) H \(x) E ) = ™fl; } 



Figure 5: Algorithm for Gibbs sampling of CB net on quantum computer. Nodes of 
CB net x l are visited in a fixed deterministic order. 



net that generates p(x t+l3Nnd3 \x f ) and then we find a q-embedding of that CB net. 

As an example, suppose N nds = 3 and [3 = 2. FigEl shows a CB net that will 
generate p^x t+l3Nnis \x t ) provided that we set 

P(xl) = 5{x% {x% rev ) (62) 

for i G Z\£. Here (xj) prev are the values of x\ obtained previously in the algorithm. 

Note that Figj6] is divided into 7 "time slices" t + j for j 6 Zq$. Eq. (l6"2l) 
gives the probability matrices associated with the nodes of the first time slice. The 
probability matrices associated with the nodes of the other 6 time slices are as follows. 

For j E Z , 5 , 

p(t+i+jit+j — p t+j t+js 

r \ X l<8j l X 2®j> X 3®j) — r x 1(Bj \x 2(!)j ,x 3(S)j V X lej \X 2e>j , X 3eij ) , yVOd) 
P{. X 2®j > ~ : '\ X 2®j) = $( X 2®j ^i X 2®j) ) (63b) 



P( X 3®j 3 \ X 3®j) — $( X 3®j ^i X 39j) ' (63c) 



where © denotes addition mod 3 with 3 and identified^] Here P Q 
node probability of the CB net x l . We can replace P x \ x x by P x \< X ),, DI ,^; 



In C language, x © y = ((x + y)%3 == 0?3 : (x + y)%3), where x, y are non-negative integers. 
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Figure 6: CB net for Gibbs sampling algorithm, for a CB net x* with 3 nodes, sweeping 
through the nodes of x* in a fixed deterministic order, repeating this all-nodes-sweep 
twice. 



if the Markov blanket of x l0J - does not include all nodes in (x){i®j}c. For example, 
when j = 0, we get the conditional probabilities of the nodes of the first time slice: 

P{pu^ \x 2 , X 3 ) = Px 1 \x 2 ,x 3 (Xl~ \ x 2i X 3) > (64a) 

P(x* +1 |4) = 5(4 +1 ,4), (64b) 

p(4 +1 \4) = s(4 + \4). (64c) 

Here P Xl \x 2 ,x 3 is a node probability of the CB net x l . We can replace P Xl \x 2 ,x 3 by 
P £i K^) MS(1) if the Markov blanket of x x does not include all nodes in (x){\y. 
The full probability distribution for the net of Figj6] is given by: 
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P(V rr.t+1 „t+2 t+3 t+4 t+5 t+6\ 




4 +3 ) > , (65) 



where each of the slanted lines represents a Kronecker delta function equating the 
two variables at the two ends of the line. Summing the full probability distribution 
over all nodes except the final ones x* +6 yields: 



P{x 



t+6\ 



E 



E 

-t+3 _t+4 „t+5 
i x 2 ' x 3 



p(x t 1 )p(4 +1 )P( 
p(4 



t+3| t+1 „t+2N 



t+2\ 



^3 > 

pf„t+5| t+3 i+4\ 
l x 3 l x l 5 x 2 / 
*+6| t+4 f+5N 

„t+5 ™t+6N 
2 l x 3 > x l / 
p/„t+6|_.i+6 ~t+6\ 



p(4 

P(x t+% 



(66) 



It is convenient to make the following change of notation: 



1) ^2 
t+3 t+4 



2 J X 3 
X t+6 



X 



t+6 



t+5\ 



(xt, xi X\) 

-t+3 yt+3 X t+3 ) 



{XV 



) A 2 



(67) 



Described more succinctly, what we are doing is replacing x by X and t + j by 
t+ (,7'/3)3 for j e Zo,6? where the division by 3 is "integer division" , with no remainder, 
an operation available in most computer languages. In the new notation, Eq. 0661) 
simplifies to 
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Xt+3 



equivalent ly, 



P{X V ) 

p(xr 3 |x*,x*) 

P{X^\XiX[+Z) 
P(X* +3 |X 1 * +3 ,X*+ 3 ) 
P(X t 1 +6 \X t 2 +3 ,X t + 3 ) 
P(X t 2 +6 \X t + 3 ,X t 1 +6 ) 

P(X* +6 IX* +6 ,X* +6 ) 



p(^) = ^ ppo n 



J=l,2 



P(X* +3i |Xj +3i ,X*' +3 - ? ') 
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3 2 



4 3 2 



.ax\(i> 

• ax^<i> 

■ bx' 2 (i) 
.ax'sO) 

»ax', +l <i) 

iax' 3 +1 (i) 
»bx'3 +l (') 
.axf(l> 
.bx'^i) 
.asf<l) 
,bx!, +2 (l) 

.ax' +2 <i> 
»bx t+2 (i) 

■ ax\ +3 <l) 
,bx^i) 

■ g£ 3 (i) 
,bx^i) 
»ax' 3 +3 (l) 
ibx^(l) 



-.bx^l} 
-.bx^l) 



^.bxf(l) 
-.ax t 2 +5 (i> 

H.bX^l) 



. ax, +6 (l) 
,bxf(l> 
. ax' 3 +6 (l> 
. bx'^l) 



Figure 8: Quantum circuit for QB net of FigJT] 



Following the steps of Figj3l we obtain the QB net FigfTJ a q-embedding of the 
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CB net of FigJHJ In FigJTJ due to lack of space, labels for the original (light-colored, 
not black) nodes have been omitted. These omitted labels can be re-constituted as 
follows. If the original node has incoming arrows Zi,z$, ... ,z n , then the node is 
labeled by the n-tuple (z^,^, ■ ■ ■ ,z n ). 

From the QB net of FigJTJ one easily obtains the equivalent quantum circuit 
shown in FigJHJ Some simple observations about this circuit are: 

• The first time slice t uses only 4 qubits, the intermediate ones use 5 qubits, and 
the final one t + 6 uses 3 qubits. 

• Once the "column of (3)'s" is reached midway into time slice t + j for j e Z 15 , 
the qubits of the previous time slice can be recycled (used again). Also, in each 
time slice, at least one of the six qubits is never used. Thus, we only need 
5 + 5 = 10 qubits for N n d s = 3, or 2(2N n d s — 1) in general. 

• All worldlines except those of the first time slice start at |0). In the first time 
slice, ax\(l) and bx\(l) start at \(xf) prev ) for i = 1,2,3. 

• Nodes (3) may be omitted since they do not change the state of their qubit. 

Let T ext be the set of all external nodes of the QB net of FigJTJ and let Y int be 
the set of all other nodes. Let A be the full amplitude of the QB net. Then 



P(ax t+e (3)\(x% rev ) = 

F ext -{ax t + 6 (3)} 



5> 



(70) 



Iprev j 



By construction, the probability P(ax t+6 (3) | (x f )pr e v) an d the probability P(x t+6 \ (x 
of Eq.(JS2J) should be equal, if we equate ax t+6 {3) and x t+6 . 

Next, we will give the node amplitudes for the nodes of quantum circuit FigJHJ 
We will give node amplitudes only when those amplitudes are non-trivial (i.e., not 
equal to just a delta function, like they are for the (1) and (3) nodes). We will give 
compilations for these non-trivial node amplitudes assuming val(a) e Bool for all 
nodes a of the CB net x*. If some node has more than two possible values, then, 
we increase the number of values of that node to a power of two. Compilation of 
node amplitudes in this case is slightly more complicated than when all nodes have 
only two possible values, but it can still be done using the multiplexor techniques of 
Section HJ 

Nontrivial nodes in initial time slice t: 

1. Nodes (ax%{2), bx\ (2)) and (ax|{2), bx\{2)). These two nodes are analogous. 
Consider the first one for definiteness. One has 



A(ax\Q) ,bx\(2)\ax t 2 (\) — {x^pr^^bx^l) — (x2) prev ) — 
e(axl(2) = ax\{l) = (x*) pre ,)0(6x* (2) = 6ar|<l> = (x*) pre „) " 1 > 

One can extend this A to the identity matrix, if it acts on |os|<i> = (x%) prev ) |&£*<i> = (4) f „„). 
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2. Node ax\(2). One has 



A(ax\(2)\ax\(l) = (x 
e(ax\(2) = ax\{\) = (x 



1 Jprev J 



1 Jprev J 



(72) 



One can extend this A to the identity matrix, if it acts on \ax\ (1) = (x\) prev ). 
Nontrivial nodes straddling time slices t + j and t + j + 1 for j e Z 4 : 



j 


12 3 4 


m 


12 3 12 



1. Nodes (te^ + ^2>,^ + ^2>, te^(4>, te*+/ )el (4>) where- 

and the © denotes mod 3 addition with 3 and identified. These five nodes are 
analogous. Consider j = for definiteness. 

For node (bx^ +1 {2), a^ +1 (2), 6^(4)), b^ 2 (4)), one has 



A(bx t 1 +1 (2),ax t 1 +1 (2),bx t 3 (A),bx t 2 (A)\bx\ +1 (l) = 0,ax* +1 (l) = 0, 6x|(3), te*(3)) = 

(73) 



P, i| , 3 „ 2 (a^ 1 (2)|64(3),^ 2 (3)) £p°>£(g«£f$ 



If we indicate non-zero entries by a plus sign, 



A = 





0000 


0001 


0010 


0011 




(bx t + 1 ,ax t 1 +1 ,bxt i ,bx t 2 )= 0000 


+ 










0001 




+ 








0010 






+ 






0011 








+ 




0100 












0101 












0110 












0111 












1001 












1010 












1011 
1100 
1101 


+ 


+ 








1110 

1111 






+ 


+ 




^(3) n(2) / 2 ® Yl e ' 1 







beBool' 2 

a x (3) n ^ e^ (2) P g (l,0), 



(74) 



(75) 
(76) 



for some Or G 



b&Bool 2 

This choice of A can be compiled using multiplexor methods. 
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2. Nodes (qx t f ^ ) +j (2),qx t f + ^ ) (A)) where ■ 



analogous. Consider j = for deflniteness. 
For node (aXo +1 (2), ax|(4)), one has 



j 


12 3 4 


m 


2 3 12 3 



. These five nodes are 



Thus, 



A(ax t 2 +1 (2),ax t 2 (4:)\ax t 2 +1 (l) = 0,ax*(3)) 
6(a4 +1 (2) = ax* (4) = axi(3)) 



(77) 



A = 





(ax t 2 +1 (l), ax\ <3>) = 
00 


01 




(ai* +1 {2>, ax\(4))= 00 


1 







01 










10 










11 





1 





a x (l)"W/f . 



(78) 



(79) 



This operation is unnecessary except at the end, between time slices t + 5 and 
t + 6. 



3. Nodes (bx^ ) +j (2),^ ) +j (2),qx^ ) (A)) where 

five nodes are analogous. Consider j = for deflniteness. 
For node (bx^ +1 (2), ax^ +1 (2), ax|(4)), one has 



j 


12 3 4 


m 


3 12 3 1 



These 



A(bx t 3 +1 (2),axl +1 (2),ax t 3 (A)\b4 +1 (l) = 0,a4 +1 (l) =0,ax*(3» = 
0(te' 3 +1 (2) = a4 +1 (2) = ax* (4) = ax|(3)) 

Thus, 



(80) 



A = 





(bxl +1 (1) , axl +1 (1) , ax\{S)) = 
000 


001 




(6i3 +1 <2>, ai* +1 {2>, aa;|{4>)= 000 


1 







001 










010 










011 










100 










101 










110 










111 





1 





[a x (2V x (l)]"(°)/f . 



(81) 



(82) 
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Nontrivial nodes straddling time slices t + 5 and t + 6: 

1. Nodes (Wi +6 (2),aXi +5 (4)) and (az 2 +6 (2), az* 2 +5 (4)). These two nodes are anal- 
ogous. Consider the first one for definiteness. One has 



Thus, 



A(ax\ +6 (2), ax\ +b (4)\ax\ +ti {l) = 0,ax? b {3)) = 



J+5 



t+6. 



r-t+5, 



0(ax\ +b (2) 



ax 



t+5 



(4) 



ax\ +b {3}) 



(83) 





(axl +6 (l), ax t + 5 (3)) = 
00 


01 




(oi t + 6 (2), ax*+ 5 (4»= 00 


1 







01 










10 










11 





1 





-> a x (l) n(0) / 2 ® 2 . 
2. Node (ax3 +6 (2),te2 +5 (4),te' 1 +5 (4)). One has 



(84) 



(85) 



A(ax t 3 +b (2),bx t 2 +b (A),bx\ +5 (4)\ax t 3 +b (l) = 0, &4 +5 <3>, te'+ 5 <3)) 
^la,a(«4 +6 (2)|64 +5 (3),^(3) tP^Ll-S! 

If we indicate non-zero entries by a plus sign, 



(86) 





000 


001 


010 


011 




(a4 +6 ,&a;* +B ,6a;* +B ) = 000 


+ 










001 




+ 








010 






+ 






011 








+ 




100 


+ 










101 




+ 








110 






+ 






111 








+ 





beBool 2 

= £ ^^(1,0), 



(87) 



(88) 
(89) 



beBool' 2 



for some #r G R. This choice of A can be compiled using multiplexor methods. 
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6.2 Metropolis-Hastings Sampling 

In this section, we will propose a method for doing Metropolis-Hastings sampling of 
a CB net on a quantum computer. The traditional method for doing Metropolis- 
Hastings sampling of a CB net on a classical computer is reviewed in Appendix IB. 21 
Compare Eq. (11091) for the Gibbs algorithm with Eq. (jl22p for the Metropolis- 
Hastings algorithm. From this comparison we conclude that if in the Gibbs algorithm 
of Figj5l we replace P(x t i +1 \(x t )MB(i)) f° r i E E c by the following, we will be doing 
Metropolis-Hastings. 

Vi 

\ e(4 = tf 1 )[i-E usyiM Q i ( Vi \x t )] 

A Appendix: Importance Sampling 

of CB Nets on a Classical Computer 

In this Appendix, we review the importance sampling algorithm for CB nets on a 
classical computer. 

Consider a CB net whose nodes are labeled in topological order by {x x , x 2 , . . . %N nds ) 
./'. Assume that E (evidence set) and H (hypotheses set) are disjoint subsets of Zi^ nda , 
with Z^N nda —EUH not necessarily empty. Let X c = Z\^ nda —X for any X C Zi^ nda . 
Assume that we are given the prior evidence (x)e, and the number of samples N sam 
that we intend to collect. 

Suppose x' is an arbitrary point in val(x). (We'll use the unprimed x, as in 
(x)e, to denote the evidence.) The probability matrices associated with each node 
of our CB net will be denoted by P(x' i \(x') pa ^)) for each i e Z^ Nnds . In addition, 
we will assume we are given sampling probability matrices, associated with each 
node of our CB net, denoted by Q{x' i \{x') pa ^) for each i G Z^ Nnda . In all cases, these 
sampling matrices are constrained to satisfy 

Q(x' t \(x') pa(l) ) = P&KxVti)) Vi6£ c . (92) 

Two important special cases of importance sampling are rejection sampling and like- 
lihood weighted sampling. For rejection sampling (RS), 

Q(^K^W)) = P«|(x') paW ) V* e z 1>Nnds . (93) 

Hence, Q(x') = P(x') for rejection sampling. For likelihood weighted sampling 
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(LWS) (a.k.a. likelihood weighting), 



Wd^VwJ 1 8{x i ,x' i ) VieE • iy ] 



Hence, Q{x!) = YlieE c P{ x 'i\{ x ')pa,{i)) f° r likelihood weighted sampling. 

Under these assumptions, the importance sampling algorithm is given by Figj9] 
(expressed in pseudo-code, pidgin C language). 



For all (x) H [W[{x) h ] = 0; } 
W tot = 0; 

For samples k — 1,2, ... , N sam { 
L = l- 

For nodes i = 1, 2, . . . , N nds { 

Generate x^ k > from Q(xi\(x^) pa ^)); 
I /Here, for LWS, x^ k ' == Xi when i E E. 
//pa(i) C so (x^)pa(i) known at this point, 

if i E E{ 

if == Xi{ 

j P(Xil(g (fc) )pq(i)) . 

0(**l(*W)p»«)' 
//Here g = 1 for RS and g = P for LWS. 

}else{/ /LWS never enters here 

go to next k; 

} 

} 

}//i loop (nodes) 

W[(x^) H ] + = L- 

W tot + = L; 
}//k loop (samples) 
For all (x) H {P((x) H \(x) E ) = ™fl; } 



Figure 9: Algorithm for importance sampling of CB net on classical computer. 



Claim 4 For the algorithm of Fig\g W ^ o f -> P((z)^|(x) s ) as iV sam -> oo. 
proof: 

Define the likelihood ratio function: 
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for all x' £ val(x). Clearly, 



Q{x')L E {x') = ]J{Q(x'Mx%a(i))}n{P(xl\(x% a(i) )} (96) 
= P(x') . (97) 



For any function g : val(x) — > R, as N sam — * oo, the sample average g(x^) tends to: 
SiW) = J- £ ff (xW) - £ Q(x')5[(x) E , (x')*]^') • (98) 

"sum 

Therefore, 



W[(x) g ] = T^Y, k LE(xM)5[(x) H ,(xW) H ] 
_ Y. x > P(x')S[(x)euh, (x')euh] 

Y,x> P ( x ')$[( x )e, (x')e] 

P((x)euh) 
P((x)e) ■ 



(99) 
(100) 
(101) 



QED 



B Appendix: Markov Chain Monte Carlo 
for CB Nets on a Classical Computer 

In this Appendix, we review two examples (Gibbs and Metropolis-Hastings) of Markov 
Chain Monte Carlo (MCMC) sampling algorithms for CB nets on a classical computer. 
A Markov chain is a CB net of the form 

x° -> x 1 -> x 2 -> . . . -> x T , (102) 

where val(x^) is independent of £ £ ^o,t- 

It's clear from its graph that a Markov chain satisfies 

p(x' + Vy~V--z°) = p(x* + V) , (103) 

i.e, the probability that x t+1 = x t+l at time t + 1 is independent of what happened at 
all previous times except at the immediate past t. The N^t x iV £ t matrix with entries 

called the transition matrix of the Markov chain; we will 
represent it by T. We will assume that T is independent of t (this property of T is 
called time invariance or time homogeneity). 
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Let tt : t>aZ(x') —>■ R be a probability vector ( tt(x) > for all x G val{x^) and 
We say 7r is a stationary distribution of T if 

Tvr = Ti , (104) 

i.e., tt is an eigenvector of T with unit eigenvalue. 

We say tt is a detailed balance of T if T(x'|x)7r(x) is invariant under the 
exchange of x and x'\ that is, 

T(x'\x)tx(x) = T(x\x')ir(x') , (105) 

for all x, x' G t>a/(x*). Detailed balance is tantamount to equilibrium since T(x'|x)7r(x) 
is the probability flux being transmitted from state x* = x to state x* +1 = x' after 
a long time, and T(x\x')tt(x') is that being transmitted in the opposite direction, 
and these two are equal. Hence, it is not surprising that if 7r is a detailed balance of 
T, then tt is a stationary distribution of T. Indeed, summing over x both sides of 
Eq. fll05p proves this. 

A Markov Chain Monte Carlo (MCMC) sampling algorithm is a 
method whereby, given a Markov chain x° — > x 1 — > x 2 — > . . ., we find a set of 
points in val(x?) that is distributed according to the stationary distribution 7r of the 
Markov chain, tt is taken to be the full probability distribution of a CB net. Next we 
discuss two examples of MCMC sampling algorithms: the Gibbs and the Metropolis- 
Hastings sampling algorithms. Actually, the Gibbs algorithm is a special case of the 
Metropolis-Hastings one, but I think it is pedagogically beneficial to discuss the Gibbs 
algorithm first, separately. 

B.l Appendix: Gibbs Sampling 

of CB Nets on a Classical Computer 

In this Appendix, we review the Gibbs sampling algorithm for CB nets on a classical 
computer. 

Consider a Markov chain x° — > x 1 — > x 2 . . . — ► x T . Let x* = (x\,x^, . . . , xf N ) 
for each time t represent a separate copy of a CB net with nodes 2i, x|, • • • , xjv > an d 
probability matrices P(x t i \(x t ) pa ^)). (The nodes of the CB net are not necessarily in 
topological order.) Assume that E (evidence set) and H (hypotheses set) are disjoint 
subsets of Zi t N nds , with Zi^ nds — EUH not necessarily empty. Let X c = Z\^ ncU —X for 
any X C Z\^ nds . Assume that we are given the prior evidence (x)e- All probabilities 
in this section about the Gibbs algorithm will be conditioned implicitly on (x*) e — 
(x)e for all t. The Gibbs algorithm is designed to respect this constraint, by never 
changing the value of (x?)e after it is initially set. Assume that we are given the last 
time T of the Markov chain, and the burn time tbum (0 << tb urn << T). 

Under these assumptions, the Gibbs sampling algorithm is given by FigflQl 
(expressed in pseudo-code, pidgin C language). 
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For all (x) H {W[(x) h \ = 0; } 
W tot = 0; 

Initialize x° to some value x°, subject to (x°)e 
For times t = 0, 1, 2, . . . ,T - 1{ 

Draw i uniformly from Zi^ nda ; 
\iieE{ 



}else{ 

Generate ~ P(x- +1 |(a;*)MB(i)); 

} 

{x t+1 ) {i} c = (x*) {i}c ; 

if t > tburn{/ 1 << tburn « T 



} 



W[(x t+1 ) H \ : 

W tot + +; 



}//t loop (times) 

For all (x) H {P{{x) h \{x) e ) 



Wtot ' J 



Figure 10: Algorithm for Gibbs sampling of CB net on classical computer. Nodes of 
CB net x l are visited at random. 



Claim 5 For the algorithm of Fia Fffy P(gf = x l ) is a stationary distribution of 
P(xf +1 = x t+1 \x f = x 1 ). In other words, 

P(x t+ V)^) =P(x t+1 ) (106) 

x t dval(x^) 

for all x t+1 G val(x^). 
proof: 

One begins by conditioning the transition matrix on the node index i: 

P( x *+! = JL J2 P(x t+1 \x\ i) . (107) 

nds ^ 

Rather than proving Eq. (11061) . we will prove the stronger statement 

P(x t+1 \x\ i)P(x l ) = P(x t+1 ) (108) 

x t 

for all i G Z\^ nis . If P(x t+1 ) is a stationary distribution of P{x t+1 \x t ,i) = T(i) for 
any i, then it is a stationary distribution of any product T(ii)T(i 2 ) • ■ ■ T(i n ), for any 
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sequence %2, ■ ■ ■ , i n of i's. 

Studying the algorithm of Fig{10] carefully, we conclude that 



P(x t+l \x\i) 



Q(i G P c )P(x* +1 |(x') MBW ) + Q(i G E)0 



t+i 



r(^' +1 ){i} c 



For i G E, P(x t+1 \x\i) = 8% , so Eq. (11081) is clearly satisfied. For % G P c , 



(109) 



s £ j P(x t+1 \x t ,i)P(x t 



^P(x* +1 |(x t+1 ) {l}c )P((x t+1 ) Wc ,a;*) 



(110) 
(111) 



P(^ +1 |(x t+1 ) Wc )P((x t+1 ) W c) 
P(x m ) . 



(112) 
(113) 



QED 

Rather than choosing nodes of the graph x* at random, one can sweep through 
all of them, in a fixed deterministic order, repeating this all-nodes-sweep N gra times. 
Hence, we can replace the algorithm of Fig{10] by the one of FigJTTl 

The sample of points in val{x?) generated by this "nodes in fixed order" algo- 
rithm isn't time invariant for t differences At = 1, but is time invariant for At = N n d s - 

B.2 Appendix: Metropolis-Hastings Sampling 
of CB Nets on a Classical Computer 

In this Appendix, we review the Metropolis-Hastings sampling algorithm for CB nets 
on a classical computer. 

Consider a Markov chain 1 . Let 

for each time t represent a separate copy of a CB net with nodes x* , x^, . . . , x^ N . and 
probability matrices P(x t i \(x t ) pa ^). (The nodes of the CB net x* are not necessarily in 
topological order.) Assume that E (evidence set) and H (hypotheses set) are disjoint 
subsets of Zi t N nds , with Zi t N nds — EUH not necessarily empty. Let X c = Z\^ ncU —X for 
any X C Z\^ nds . Assume that we are given the prior evidence (x)e- All probabilities 
in this section about the Metropolis-Hastings algorithm will be conditioned implicitly 
on (x*) e — {x)e for all t. The Metropolis-Hastings algorithm is designed to respect 
this constraint, by never changing the value of (x*)^ after it is initially set. Assume 
that we are given the last time T of the Markov chain, the burn time t^um (0 << 
tbum « T), and sampling probability distributions Qi{yi\x\, (x^mbu)) where 
i G Z ljNnds , yi G val(xj), x l G val(x^). 

Under these assumptions, the Metropolis-Hastings sampling algorithm is given 
by FigJT2] (expressed in pseudo-code, pidgin C language). 
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For all (x) H {W[{x) h \ = 0; } 
W tot = 0; 

Initialize x° to some value x°, subject to (x°)e = (x)e', 
t = 0; 

For graphs g = 1, 2, 3, . . . , N gra { 
For nodes % = 1, 2, . . . , N nds {; 
if i G E{ 

X^ X\^ 

}else{ 

Generate ~ P(x t i ^ 1 \(x t ) M B{i))\ 

} 

(% t+1 ){iY = 

t + +; 
}/ ji loop (nodes) 

if t > t burn {/ 1 << tfe urn << N nds N gra 
W^x^h] + +; 

w tot + +; 

} 

}//# !oop (graphs) 

For all (x) H {P((x) H \(x) E ) = ™fl; } 



Figure 11: Algorithm for Gibbs sampling of CB net on classical computer. Nodes of 
CB net x l are visited in a fixed deterministic order. 



Claim 6 For the algorithm of Fig [T^ P(x* = x') is a stationary distribution of 
P(x* +1 = x t+l \xf = x*). In other words, 

J2 P(x t+ V)^V) = P(x t+1 ) (114) 

for all x t+1 G val(x^). 
proof: 

One begins by conditioning the transition matrix on the node index i: 

P(x t+1 \x t ) = -!— ^P(x t+ V,0 • (115) 

nds ^ 

Rather than proving Eq. (11141) . we will prove the stronger statement 

^2P(x t+1 \x\i)P{x t ) = P(x t+1 ) (116) 
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For all (x) H [W[{x) h ] = 0; } 
W tot = 0; 

Initialize x° to some value x°, subject to (x°)e = (x)e', 
For times t = 0, 1, 2, . . . , T - 1{ 

Draw % uniformly from Zi^ nds ] 

\iieE{ 

- 

}else{ 

Generate y { ~ Qi{yi\x\, (i%B(i)); 

Draw uniformly from the interval [0, 1]; 

n _ rnly. /l Q»( a 'l?/».( :ct )MB(»))-Pfa»l(^ t )MB(o) 1 . 

a, mm Q i ( w | a> t ) ( a t) Mfl(i) ) P ( a> t| (a t) JlfBW ) j , 

if («j < = ^; } else = x-; } 

/ /if = P, then a» = min(l, 1) = 1, and get Gibbs 

} 

(x t+1 ) {i y = (a;*) {i} c; 

if t > t& urn {/ 1 << tburn « T 

W[(x t+1 ) H ] + +; 
W tot + +; 

} 

}//t loop (times) 

For all (x) H {P((x) H \(x) E ) = } 



Figure 12: Algorithm for Metropolis-Hastings sampling of CB net on classical com- 
puter. 



for all i G Zi^ nds . If -P(x <+1 ) is a stationary distribution of P{x t+1 \x t , i) = T(i) for 
any i, then it is a stationary distribution of any product T(ii)T(i 2 ) ■ • ■ T(i n ), for any 
sequence ii,i 2 , . . . ,i n of i's. 

Studying the algorithm of Fig{12] carefully, we conclude the following. For 
i G E, P(x t+1 \x t ,i) = 5®t +1 , so Eq. (11161) is clearly satisfied. For i G E c , 

P{x t+1 \x\i) = J2 [ du i P{x t+1 \x\y i ,u h i)P{y l \x\i)P{u^{) . (117) 
Vi Jo 

Eq. (lll7p comes from the CB net of FigJTBl The 3 probabilities occurring on the right 
hand side of Eq. (11171) are given by 

P(x^\x\ y u u u i) = 5^ }c 5 ^<^ 6 ^ , (118a) 
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Figure 13: CB net connecting the random variables used in the Metropolis-Hastings 
algorithm. 



P{Vi\x ,i) = Q l (y i \x i , (x )mb({)) 



(118b) 



and 



P(Ui\i) = 1 . 

{u.i is a continuous random variable.) 

One can "sum over" the node Ui of Fig 



(118c) 



duiP{x t+1 \x\y h u h i) =6^r + • (H9) 



Define 



One can also sum over the node yi of FigJTBI 



(120) 



Note that 



P(x t+l \x\i) = ^& ^101 + ^(1-0,) g i (y i |4,(x*) MB (i))(121) 



€K } < I ^ + V) + CM 1 - £g<(wi**)] I • ( 122 ) 



P(s*) = P(xf|(x*) {i}c )P((x*) {i}c ) 
= P(x*|(x') Mfl (i))^((^) W 



(123) 
(124) 



Hence, 
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OU,,|x )P(x ) mini 1 . Qi(8l ,4 ( «t )MBw) p ((B t| ((e *) Mflw) |0.W.I*i.(»' W))^ ) 



Qi(yi\x\,{ xt )M B(i)) p (?\\( xt ') M B(i)), 
Qi(z-|^,(z t ) M B(o) P (Kl( x ' t )M,B(i)) 



p((x*) {l}c ) . (125) 



From Eq. fll25p . we see that Q(?/i|x*)P(x*) is invariant under the exchange of yi and 
x-. In other words, P(x t ) is a detailed balance of Q(y i \x t ) under the exchange of yi 
and x\. However, J2 Vi Q(Ui\ xt ) 1> so Qd/il 37 *) is n °t a probability distribution in y^. 
Now we have 




= T!+T 2 + T 3 , 
where 

Ti = J2€K^ x ^ lxt)p{xt) (127) 

= ^g i (xr 1 |x*,(^ +1 ) W c)P(^,(x t + 1 ) W c) (128) 

= £Q;(*f + \ (129) 

= ^g i (x^*+ 1 )P(x t+1 ) , (130) 

(To go from Eq. (11281) to Eq. (11291) . we used the fact that P{x t ) is a detailed balance 
ofQ^lx*).) 

T 2 = ^^; +1 P(x*)=P(x* +1 ), (131) 



and 



T 3 = (132) 

Vi 

= -Ti . (134) 
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QED 

As we pointed out for the Gibbs algorithm in Appendix IB.ll rather than 
choosing nodes of the graph x l at random, one can sweep through all the nodes of 
the graph in a fixed order, repeating this sweep N gra times. Hence, we can replace 
the algorithm of Fig{T2]by an alternative one. We leave the details to the reader. 

There are several important special cases of the algorithm of FigIT2l 

1. When Qi{yi\x\, (^*) M B(i)) = PiViKz^MBii)), we get a* = min(l, 1) = 1, yielding 
the Gibbs algorithm discussed in Appendix IB.ll 

2. When N n d s = 1, there is no need for i subscripts in x\ or There is also no 
possibility of evidence. Hence, the algorithm of Fig{12] simplifies to the one in 
FiglH 



For all (x) H {W[(x) h ) = 0; } 
W tot = 0; 

Initialize x° to some value x°; 
For times t = 0, 1, 2, . . . , T - 1{ 

Generate y ~ Q(y\x t ); 

Draw u uniformly from the interval [0, 1]; 

a = min /l Q{Av)P(y) \ . 
a mmy, Q(ylxt)p(xt) j , 

if (u < a){x t+1 = y; } else {x t+1 = x t ; } 

if t > tburn{/ / << tburn «T 

W[(x t+l ) H ] + +; 

W tot + +; 

} 

}/ jt loop (times) 

For all (x) H {P((x) H ) = ™fl; } 



Figure 14: Algorithm for Metropolis-Hastings sampling of CB net on classical com- 
puter. Special case where N n ds = 1- 



3. If Qii^xWy^ (x')mb(j)) is invariant under the exchange of x\ and y iy then 



a,- = mm 



l,^fHl • (135) 



" 1 1 



In particular, if N nds = 1, 
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This is called the Metropolis sampling algorithm. It was invented prior to 
the Metropolis-Hastings one. 

The algorithm of Fig{12] and the proof of Claim [6] are both fairly complicated. 
One wonders, why do they work? What logic motivated the invention of the algo- 
rithm? Here is an intuitive explanation of what is going on. Define 

T{x\ <- yi) = Qi(a%\yi, (x*) MB(i) )P(?/ i |(x*) MBW ) . (137) 

Define T{%ji <— xf) as the expression on the right hand side of Eq. (11371) . but x\ and yi 
exchanged. T{x\ <— Vi) is the probability flux flowing from state yi to state x\, and 
T{i)i <— xj) is the flux in the opposite direction. Now a iy often called the "acceptance 
probability" , can be expressed as 

• J\ «- Vi)\ f,OQ\ 

ai = mm V H^W) S • (138) 

Note that this definition of a, puts it in the interval [0, 1], as required for a probability. 
Recall that the algorithm defines: 

x\ +1 = yiQ(ui < oti) + xlQ(ui > a<) . (139) 

Thus, 

1. If T[x\ <— yi) << T{yi <— xf), then a { = T[x\ <— y i )/J r {y i <— xf) « 1. In this 
case, we 

(a) Set = yi (i.e., accept the new value), doing this infrequently, a« of the 
time. 

(b) Set x 1 ^ 1 = x\ (i.e., keep the old value), doing this frequently, 1 — a« of the 
time. 

2. If T{x\ <— y^) > T{^i ±— xf), then di = 1. In this case, we always set = yi 
(i.e., accept the new value). 

Most of the time (except in case [Taj, we are trying to "buck (counteract) the trend" 
that state x\ is either gaining or loosing weight. We don't buck the trend always, 
because we want to allow a small probability of escaping local minima. 



C Appendix: Quantum Circuits for 



Two Examples of Ref. |4 

FigsH5] and [16] are quantum circuits for the two-body scattering and Asia nets, re- 
spectively, that are discussed in Ref. [3]. 
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Figure 15: Quantum circuit for two body scattering QB net of Ref.[l] 
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Figure 16: Quantum circuit for Asia QB net of Ref.[l] 
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